Unsupervised Lexical Learning with Categorical Grammars Using the LLL Corpus
نویسندگان
چکیده
In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories , the forward and backward application rules of CG and unmarked positive only corpora. Using the categories and rules, the sentences from the corpus are probabilis-tically parsed. The parses and the history of previously parsed sentences are used to build a lexicon and annotate the corpus. We report the results from experiments on two generated corpora and also on the more complicated LLL corpus, that contain examples from subsets of the English language. These show that the system is able to generate reasonable lexicons and provide accurately parsed corpora in the process. We also discuss ways in which the approach can be scaled up to deal with larger and more diverse corpora.
منابع مشابه
Unsupervised Lexical Learning with Categorial Grammars using the LLL Corpus
In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories , the forward and backward application rules of CG and unmarked positive only corpora. Using the categories and rules, the sentences from the corpus are probabilis-tically parsed. The parses and the history of previously parsed se...
متن کاملFrom Grammar to Lexicon: Unsupervised Learning of Lexical Syntax
Imagine a language that is completely unfamiliar; the only means of studying it are an ordinary grammar book and a very large corpus of text. No dictionary is available. How can easily recognized, surface grammatical facts be used to extract from a corpus as much syntactic information as possible about individual words? This paper describes an approach based on two principles. First, rely on lo...
متن کاملOn the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars
We examine the utility of a curriculum (a means of presenting training samples in a meaningful order) in unsupervised learning of probabilistic grammars. We introduce the incremental construction hypothesis that explains the benefits of a curriculum in learning grammars and offers some useful insights into the design of curricula as well as learning algorithms. We present results of experiments...
متن کاملLearning in the Lexical-Grammatical Interface
Children are facile at both discovering word boundaries and using those words to build higher-level structures in tandem. Current research treats lexical acquisition and grammar induction as two distinct tasks. Doing so has led to unreasonable assumptions. Existing work in grammar induction presupposes a perfectly segmented, noise-free lexicon, while lexical learning approaches largely ignore h...
متن کاملUnsupervised Lexical Learning With Categorial Grammars
In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories, the forward and backward application rules of CG and unmarked positive only corpora. Using the categories and rules, the sentences from the corpus are probabilistically parsed. The parses and the history of previously parsed sent...
متن کامل